
    Relaxation Penalties and Priors for Plausible Modeling of Nonidentified Bias Sources

    In designed experiments and surveys, known laws or design features provide checks on the most relevant aspects of a model and identify the target parameters. In contrast, in most observational studies in the health and social sciences, the primary study data do not identify and may not even bound target parameters. Discrepancies between target and analogous identified parameters (biases) are then of paramount concern, which forces a major shift in modeling strategies. Conventional approaches are based on conditional testing of equality constraints, which correspond to implausible point-mass priors. When these constraints are not identified by available data, however, no such testing is possible. In response, implausible constraints can be relaxed into penalty functions derived from plausible prior distributions. The resulting models can be fit within familiar full or partial likelihood frameworks. The absence of identification renders all analyses part of a sensitivity analysis. In this view, results from single models are merely examples of what might be plausibly inferred. Nonetheless, just one plausible inference may suffice to demonstrate inherent limitations of the data. Points are illustrated with misclassified data from a study of sudden infant death syndrome. Extensions to confounding, selection bias and more complex data structures are outlined.
    Comment: Published at http://dx.doi.org/10.1214/09-STS291 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
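    As a rough illustration of the relaxation idea, the sketch below (with invented counts and prior settings, not values from the paper) replaces the implausible point constraint of perfect exposure classification with normal-prior penalties on the logits of sensitivity and specificity, then maximizes the penalized likelihood:

```python
# Hypothetical sketch: relaxing an unidentified point constraint (perfect
# classification, Se = Sp = 1) into penalty terms equal to minus the log of
# plausible priors on logit(Se) and logit(Sp). All counts and prior settings
# are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logit

x, n = 380, 1000          # observed "exposed" count (misclassified), sample size

def neg_penalized_loglik(theta):
    lp, lse, lsp = theta                    # logits of prevalence, sensitivity, specificity
    p, se, sp = expit(lp), expit(lse), expit(lsp)
    p_obs = p * se + (1 - p) * (1 - sp)     # observed prevalence under misclassification
    loglik = x * np.log(p_obs) + (n - x) * np.log(1 - p_obs)
    # Relaxation penalties: normal priors on logit(Se) and logit(Sp)
    # (prior means and SDs chosen arbitrarily for illustration).
    penalty = ((lse - logit(0.9)) ** 2 / (2 * 0.5 ** 2)
               + (lsp - logit(0.95)) ** 2 / (2 * 0.5 ** 2))
    return -loglik + penalty

fit = minimize(neg_penalized_loglik, x0=[0.0, logit(0.9), logit(0.95)])
print(f"penalized estimate of true prevalence: {expit(fit.x[0]):.3f}")
```

    Because the bias parameters are not identified, rerunning the fit under different priors is itself the sensitivity analysis the abstract describes.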

    Comment: The Need for Syncretism in Applied Statistics

    Comment on "The Need for Syncretism in Applied Statistics" [arXiv:1012.1161].
    Comment: Published at http://dx.doi.org/10.1214/10-STS308A in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    The causal foundations of applied probability and statistics

    Statistical science (as opposed to mathematical statistics) involves far more than probability theory, for it requires realistic causal models of data generators - even for purely descriptive goals. Statistical decision theory requires more causality: Rational decisions are actions taken to minimize costs while maximizing benefits, and thus require explication of causes of loss and gain. Competent statistical practice thus integrates logic, context, and probability into scientific inference and decision using narratives filled with causality. This reality was seen and accounted for intuitively by the founders of modern statistics, but was not well recognized in the ensuing statistical theory (which focused instead on the causally inert properties of probability measures). Nonetheless, both statistical foundations and basic statistics can and should be taught using formal causal models. The causal view of statistical science fits within a broader information-processing framework which illuminates and unifies frequentist, Bayesian, and related probability-based foundations of statistics. Causality theory can thus be seen as a key component connecting computation to contextual information, not extra-statistical but instead essential for sound statistical training and applications.
    Comment: 22 pages; in press for Dechter, R., Halpern, J., and Geffner, H., eds., Probabilistic and Causal Inference: The Works of Judea Pearl. ACM Books.
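    A toy simulation (not from the paper; all numbers assumed) of the opening point that even simple descriptive statistics presuppose a causal model of the data generator: a confounder Z drives both X and Y, so the crude X-Y risk difference misrepresents the effect built into the generator, while standardizing over Z recovers it:

```python
# Illustrative sketch: the same crude association can arise from very
# different data generators, so its interpretation requires a causal model.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.binomial(1, 0.5, n)                       # confounder
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))   # treatment depends on Z
y = rng.binomial(1, 0.1 + 0.1 * x + 0.3 * z)      # outcome depends on X and Z

crude = y[x == 1].mean() - y[x == 0].mean()
# Standardize over Z; uniform weights suffice here since P(Z=1) = 0.5.
adjusted = np.mean([y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean()
                    for v in (0, 1)])
print(f"crude risk difference:  {crude:.3f}")     # distorted by confounding
print(f"Z-adjusted difference:  {adjusted:.3f}")  # recovers the 0.1 built into the generator
```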

    Divergence vs. Decision P-values: A Distinction Worth Making in Theory and Keeping in Practice

    There are two distinct definitions of 'P-value' for evaluating a proposed hypothesis or model for the process generating an observed dataset. The original definition starts with a measure of the divergence of the dataset from what was expected under the model, such as a sum of squares or a deviance statistic. A P-value is then the ordinal location of the measure in a reference distribution computed from the model and the data, and is treated as a unit-scaled index of compatibility between the data and the model. In the other definition, a P-value is a random variable on the unit interval whose realizations can be compared to a cutoff alpha to generate a decision rule with known error rates under the model and specific alternatives. It is commonly assumed that realizations of such decision P-values always correspond to divergence P-values. But this need not be so: Decision P-values can violate intuitive single-sample coherence criteria where divergence P-values do not. It is thus argued that divergence and decision P-values should be carefully distinguished in teaching, and that divergence P-values are the relevant choice when the analysis goal is to summarize evidence rather than implement a decision rule.
    Comment: 49 pages. Scandinavian Journal of Statistics 2023, issue 1, with discussion and rejoinder in issue
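    A minimal sketch of the two usages (the data and model are invented, and the paper's point is precisely that the two definitions need not coincide; this only illustrates the contrast in roles): the divergence P-value is the upper-tail location of a discrepancy statistic in its reference distribution, while the decision usage reduces the same number to a reject/do-not-reject output:

```python
# Hedged sketch of divergence vs. decision roles for a P-value,
# using a Pearson chi-squared discrepancy as the divergence measure.
import numpy as np
from scipy import stats

observed = np.array([55, 45])            # e.g., coin-flip counts
expected = np.array([50, 50])            # expected under the hypothesized model

# Divergence P-value: ordinal (upper-tail) location of the discrepancy
# statistic in its chi-squared(1) reference distribution -- a unit-scaled
# compatibility index between data and model.
x2 = ((observed - expected) ** 2 / expected).sum()
p_div = stats.chi2.sf(x2, df=1)
print(f"divergence P-value (compatibility index): {p_div:.3f}")

# Decision use: the realization is only compared to a fixed cutoff alpha,
# discarding everything but the reject/do-not-reject output.
alpha = 0.05
print("decision:", "reject" if p_div <= alpha else "do not reject")
```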

    Connecting Simple and Precise P-values to Complex and Ambiguous Realities

    Mathematics is a limited component of solutions to real-world problems, as it expresses only what is expected to be true if all our assumptions are correct, including implicit assumptions that are omnipresent and often incorrect. Statistical methods are rife with implicit assumptions whose violation can be life-threatening when results from them are used to set policy. Among them are that there is human equipoise or unbiasedness in data generation, management, analysis, and reporting. These assumptions correspond to levels of cooperation, competence, neutrality, and integrity that are absent more often than we would like to believe. Given this harsh reality, we should ask what meaning, if any, we can assign to the P-values, 'statistical significance' declarations, 'confidence' intervals, and posterior probabilities that are used to decide what and how to present (or spin) discussions of analyzed data. By themselves, P-values and CIs do not test any hypothesis, nor do they measure the significance of results or the confidence we should have in them. The sense otherwise is an ongoing cultural error perpetuated by large segments of the statistical and research community via misleading terminology. So-called 'inferential' statistics can only become contextually interpretable when derived explicitly from causal stories about the real data generator (such as randomization), and can only become reliable when those stories are based on valid and public documentation of the physical mechanisms that generated the data. Absent these assurances, traditional interpretations of statistical results become pernicious fictions that need to be replaced by far more circumspect descriptions of data and model relations.
    Comment: 25 pages. Body of text to appear as a rejoinder in the Scandinavian Journal of Statistics.
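    A small sketch (with invented measurements) of a P-value derived explicitly from a causal story about the data generator: under an actual physical randomization, the reference distribution of the group difference is known by design, so the permutation P-value inherits its meaning from that documented mechanism:

```python
# Illustrative randomization (permutation) test: the P-value's meaning comes
# from re-running the physical randomization on paper. Data are invented.
import numpy as np

rng = np.random.default_rng(42)
treated = np.array([7.1, 6.8, 7.4, 6.9, 7.6])
control = np.array([6.5, 6.7, 6.4, 6.9, 6.3])
obs_diff = treated.mean() - control.mean()

pooled = np.concatenate([treated, control])
n_treated = len(treated)
diffs = []
for _ in range(100_000):                 # replay the randomization mechanism
    perm = rng.permutation(pooled)
    diffs.append(perm[:n_treated].mean() - perm[n_treated:].mean())

# Two-sided P-value: how often randomization alone produces a difference
# at least as extreme as the one observed.
p = np.mean(np.abs(diffs) >= abs(obs_diff))
print(f"randomization P-value: {p:.4f}")
```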

    Epidemiologic measures and policy formulation: lessons from potential outcomes

    This paper provides a critique of the common practice in the health-policy literature of focusing on hypothetical outcome removal at the expense of intervention analysis. The paper begins with an introduction to measures of causal effects within the potential-outcomes framework, focusing on underlying conceptual models, definitions and drawbacks of special relevance to policy formulation based on epidemiologic data. It is argued that, for policy purposes, one should analyze intervention effects within a multivariate-outcome framework to capture the impact of major sources of morbidity and mortality. This framework can clarify what is captured and missed by summary measures of population health, and shows that the concept of a summary measure can and should be extended to multidimensional indices.
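    A schematic example (all risks invented, not from the paper) of the removal-versus-intervention contrast over a multivariate outcome: hypothetically 'removing' a cause of death simply reassigns its deaths to survival, whereas a realistic intervention may prevent only some of those deaths and shift others toward a competing cause:

```python
# Hypothetical sketch: cause "removal" vs. a realistic intervention when the
# outcome is multivariate (two causes of death plus survival).
baseline = {"cause_A": 0.05, "cause_B": 0.10, "survive": 0.85}

# "Removal" of cause A pretends its deaths simply vanish into survival...
removal = {"cause_A": 0.0,
           "cause_B": baseline["cause_B"],
           "survive": baseline["survive"] + baseline["cause_A"]}

# ...whereas a feasible intervention may prevent only some cause-A deaths
# and shift some persons toward the competing cause B.
intervention = {"cause_A": 0.02,
                "cause_B": baseline["cause_B"] + 0.01,
                "survive": baseline["survive"] + 0.02}

for name, dist in [("baseline", baseline), ("removal", removal),
                   ("intervention", intervention)]:
    total_mortality = dist["cause_A"] + dist["cause_B"]
    print(f"{name:13s} total mortality risk: {total_mortality:.3f}")
```

    Tracking the full distribution across causes, rather than a single summary, is what the multivariate-outcome framework in the abstract makes explicit.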